Hirsch index as a network centrality measure

Authors

  • Monica G. Campiteli
  • Adriano J. Holanda
  • Paulo R. C. Soles
  • Leonardo H. D. Soares
  • Osame Kinouchi
Abstract

We study the Hirsch index h as a local node centrality measure for complex networks in general. The h index is compared with the Degree centrality (a local measure) and with the Betweenness and Eigenvector centralities (two non-local measures) for a biological network (the yeast protein-protein interaction network) and a linguistic network (Moby Thesaurus II). In both networks, the Hirsch index correlates poorly with Betweenness centrality but correlates well with Eigenvector centrality, especially for the more important nodes that are relevant for ranking purposes, say in Search Engine Optimization. In the thesaurus network, the h index even seems to outperform the Eigenvector centrality measure as evaluated by simple linguistic criteria.

PACS numbers: 89.75.-k, 64.60.aq, 01.30.-y

arXiv:1005.4803v2 [physics.soc-ph] 27 Jun 2010

The Hirsch or h index has been proposed and mainly studied as a statistic of scientific productivity, being applied to individual researchers [1, 2, 3, 4, 5], groups [6], journals [7, 8] and countries [9] using data from citation networks. In this context, a researcher has Hirsch index h if he/she has h papers with at least h citations each [1]. Recently, Korn et al. [10] have proposed a general index for network node centrality based on the h-index. Korn et al. named it the lobby index, but since it is simply the application of Hirsch's idea in the context of general networks, we shall continue to call it the Hirsch (centrality) index. Korn et al. argue that the proposed index contains a mix of properties of other well-known centrality measures. However, they have studied it mostly in the context of artificial or idealized networks like the Barabasi-Albert model [10, 11]. Here we study the Hirsch index in two real-life networks and discuss some computational and conceptual advantages of the h-index as a new centrality measure.
We study the Hirsch centrality in linguistic and biological networks already considered by the physics community. The first one is the Moby Thesaurus II network [12, 13, 14], composed of 30,260 nodes and around 1.7 million links. The biological network is the yeast protein-protein network obtained from the Biogrid repository [15]. This is a curated repository of physical and genetic interactions covering 5,433 proteins and over 150,000 unambiguous interactions. The advantage of using these networks as benchmarks to evaluate node centrality is that they allow not only a comparison with other centrality indexes but also an independent evaluation by standard linguistic/biological criteria.

In this work we use the following definition: the Hirsch centrality index h of a node is the largest integer h such that the node has at least h neighbours which have a degree of at least h.

The linguistic network is formed by the entries or "root words" of the Moby Thesaurus II [12]. To construct the network we use the convention that an outlink goes from a root word to a related word. The raw thesaurus has over 2.5 million links, but there are many words with only in-links (that is, they are not root words). So, we worked with a cleansed version with around 1.7 million links where only root words constitute nodes [13, 14]. The minimal number of outlinks is 17 and the maximum is 1106. Notice that the graph is directed [14], but we have used as centrality measures the out-degree and the h-index based on the out-degree (from here on referred to simply as "degree" D).

The biological network as downloaded from the Biogrid is composed of gene products connected by links [15]. The links include direct physical binding of two proteins, co-existence in a stable complex, or genetic interaction as given by one or several experiments described in the literature. The links are considered here to be undirected. The degree distribution ranges from 1 to 1,975.
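The definition of the Hirsch centrality index given above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `hirsch_centrality`, the adjacency-dictionary input format and the toy graph are all assumptions made for this example. For a directed network such as the thesaurus, the dictionary would hold out-neighbours, so that the "degree" is the out-degree D.

```python
def hirsch_centrality(adj):
    """Return {node: h} for a graph given as {node: set_of_neighbours}.

    h is the largest integer such that the node has at least h
    neighbours whose degree is at least h.
    """
    degree = {node: len(neigh) for node, neigh in adj.items()}
    h_index = {}
    for node, neigh in adj.items():
        # Neighbour degrees in decreasing order; a neighbour that is not
        # a key of `adj` is treated as having degree 0 (an assumption).
        neigh_degrees = sorted((degree.get(n, 0) for n in neigh), reverse=True)
        h = 0
        for rank, d in enumerate(neigh_degrees, start=1):
            if d < rank:
                break  # fewer than `rank` neighbours have degree >= rank
            h = rank
        h_index[node] = h
    return h_index


# Hypothetical undirected toy graph with edges a-b, a-c, a-d, b-c, d-e.
toy = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a", "e"},
    "e": {"d"},
}
print(hirsch_centrality(toy))
```

In this toy graph node a has three neighbours, each of degree 2, so its h is 2 rather than 3. The computation only needs the degrees of a node's neighbours, which is what makes h a local measure.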
In Figure 1 we present dispersion plots of the h-index versus degree D for the two networks. We notice that, although correlated, the two measures are not redundant. More importantly, in the thesaurus case the h-index highlights false positives (defined in terms of their degree centrality), that is, words with high degree but low h. We also note that, by definition, a node cannot have h > D, and that the boundary h = D is saturated in the low D regime (up to D = 100). For higher D, we observe in both networks that the highest h are proportional to D, but the origin of this anomalous exponent is not clear.

Figure 1. Log-log dispersion plot of h versus Degree centrality D for a) Moby Thesaurus II and b) Yeast network.

Now we compare the Hirsch index with two standard non-local centrality indexes, Betweenness and Eigenvector centrality. First, we present in Figure 2 the dispersion plots of h versus Betweenness centrality B. No strong correlation is apparent, meaning that the two indexes seem to capture different ideas of centrality (this is discussed further below).

Figure 2. Log-log dispersion plot of h versus Betweenness B for a) Moby Thesaurus II network and b) Yeast network.

Figure 3. Log-log dispersion plot of h versus Eigenvector centrality E for the Moby Thesaurus II. Inset: linear scale; notice the several words with high E but low h.

In Figure 3 we give the dispersion plot of the h-index versus the Eigenvector centrality E for the thesaurus network. In the high E regime the maximal h values seem to be bounded by h ∝ E, as in the h versus D plot. We observe several nodes with high E but relatively low h (see Inset). Examining these nodes individually, we find that h seems to outperform E in the ranking task, since words with high h also have high E and are basic and important polysemous words.
In contrast, terms with high E can have high or low h. Those with low h are mostly phrasal verbs or multi-word expressions derived from the words with high h. It is difficult to quantify the quality of a ranking list, but the above effect is very clear, as can be observed in Table 1, which shows the top 25 words ranked by h and E (the same occurs for other high E and low h words).

Figure 4. Log-log dispersion plot of h versus E for the Yeast network. The h and E centralities are well correlated for E > 0.2, where there is an h ∝ E bound for the highest h values. Inset: linear scale; notice the cluster of high h but low E ribosome proteins.

In the case of the Yeast protein network (see Figure 4) we observe a strong correlation between h and E for E > 0.2. We notice that this regime is the relevant one for ranking purposes, say in the ranking of WWW pages or, in our case, in detecting the most important proteins. The highest h values also seem to be bounded by an h ∝ E behavior. We also observe a detached cluster of nodes with low E and moderate h (see Figure 4). It is very interesting that all these nodes seem to be ribosome proteins, meaning that the h index carries information that can be useful for detecting modules of functionally related proteins. This will be studied in detail elsewhere.

Table 1. Top 25 words ranked by Hirsch centrality (left) and by Eigenvector centrality (right).
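The kind of ranking comparison between h and E discussed above can be tried in miniature with the sketch below. It rests on several assumptions that are not taken from the paper: the graph is undirected and connected, Eigenvector centrality is obtained by a plain power iteration and scaled so that its maximum is 1 (the paper's normalization of E is not stated here), and the names `hirsch`, `eigenvector`, `top_k` and the toy graph are hypothetical.

```python
def hirsch(adj):
    """h of each node: largest h with at least h neighbours of degree >= h."""
    out = {}
    for v, neigh in adj.items():
        degs = sorted((len(adj[u]) for u in neigh), reverse=True)
        out[v] = max((r for r, d in enumerate(degs, 1) if d >= r), default=0)
    return out


def eigenvector(adj, max_iter=1000, tol=1e-12):
    """Eigenvector centrality by power iteration, scaled so the maximum is 1."""
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        # Iterate with (A + I) instead of A: the leading eigenvector is the
        # same, but the iteration does not oscillate on bipartite graphs.
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        top = max(y.values())
        y = {v: val / top for v, val in y.items()}
        if max(abs(y[v] - x[v]) for v in adj) < tol:
            return y
        x = y
    return x


def top_k(scores, k):
    """The k best-scoring nodes, ties broken alphabetically."""
    return sorted(scores, key=lambda v: (-scores[v], v))[:k]


# Hypothetical toy graph: a triangle a-b-c with a pendant chain c-d-e.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d"},
}
h, E = hirsch(adj), eigenvector(adj)
print("top 3 by h:", top_k(h, 3))
print("top 3 by E:", top_k(E, 3))
```

On this toy graph the two measures select the same three nodes but order them differently: a, b and c are tied at h = 2, while E singles out c. Real networks such as the thesaurus would need the same two score dictionaries computed over the full graph before the top-25 lists could be compared.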

Publication date: 2010